154 research outputs found

    progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement

    Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms. We describe a new method to align two or more genomes that have undergone rearrangements due to recombination and substantial amounts of segmental gain and loss (flux). We demonstrate that the new method can accurately align regions conserved in some, but not all, of the genomes, an important case not handled by our previous work. The method uses a novel alignment objective score called a sum-of-pairs breakpoint score, which facilitates accurate detection of rearrangement breakpoints when genomes have unequal gene content. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The new genome alignment algorithm demonstrates high accuracy in situations where genomes have undergone biologically feasible amounts of genome rearrangement, segmental gain and loss. We apply the new algorithm to a set of 23 genomes from the genera Escherichia, Shigella, and Salmonella. Analysis of whole-genome multiple alignments allows us to extend the previously defined concepts of core- and pan-genomes to include not only annotated genes, but also non-coding regions with potential regulatory roles. The 23 enterobacteria have an estimated core-genome of 2.46 Mbp conserved among all taxa and a pan-genome of 15.2 Mbp. We document substantial population-level variability among these organisms driven by segmental gain and loss. Interestingly, much variability lies in intergenic regions, suggesting that the Enterobacteriaceae may exhibit regulatory divergence. The multiple genome alignments generated by our software provide a platform for comparative genomic and population genomic studies. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve
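
The core- and pan-genome sizes quoted in the abstract above can be pictured with a toy calculation. The following is only a minimal sketch, not the progressiveMauve procedure: it assumes the alignment has already been reduced to a presence/absence table of segments per genome, and the function name `core_and_pan_size`, the genome names, the segment ids and the lengths are all hypothetical.

```python
def core_and_pan_size(presence, segment_length):
    """Return (core_bp, pan_bp) for a toy presence/absence table.

    presence: dict mapping genome name -> set of aligned segment ids found in it
    segment_length: dict mapping segment id -> length in bp
    """
    segment_sets = list(presence.values())
    core = set.intersection(*segment_sets)  # segments present in every genome
    pan = set.union(*segment_sets)          # segments present in at least one genome
    core_bp = sum(segment_length[s] for s in core)
    pan_bp = sum(segment_length[s] for s in pan)
    return core_bp, pan_bp


# Hypothetical toy data: three genomes, four segments (not from the paper).
presence = {
    "genome_A": {"s1", "s2", "s3"},
    "genome_B": {"s1", "s2", "s4"},
    "genome_C": {"s1", "s3"},
}
segment_length = {"s1": 1000, "s2": 400, "s3": 250, "s4": 600}

core_bp, pan_bp = core_and_pan_size(presence, segment_length)
print(core_bp, pan_bp)
```

In this toy table only `s1` is shared by all three genomes, so the core is the length of `s1` while the pan-genome sums every segment once. Because segments are not restricted to annotated genes, the same bookkeeping covers non-coding regions, which is the extension the abstract describes.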

    inGeno – an integrated genome and ortholog viewer for improved genome to genome comparisons

    BACKGROUND: Systematic genome comparisons are an important tool to reveal gene functions, pathogenic features, metabolic pathways and genome evolution in the era of post-genomics. Furthermore, such comparisons provide important clues for vaccine and drug development. Existing genome comparison software often lacks accurate information on orthologs, the function of the similar genes identified, and genome-wide reports and lists on specific functions. All these features and further analyses are provided here in the context of a modular software tool "inGeno" written in Java with Biojava subroutines. RESULTS: InGeno provides a user-friendly interactive visualization platform for sequence comparisons (comprehensive reciprocal protein – protein comparisons) between complete genome sequences and all associated annotations and features. The comparison data can be acquired from several different sequence analysis programs in flexible formats. Automatic dot-plot analysis includes output reduction, filtering, ortholog testing and linear regression, followed by smart clustering (local collinear blocks; LCBs) to reveal similar genome regions. Further, the system provides a genome alignment and visualization editor, collinear relationships and strain-specific islands. Specific annotations and functions are parsed, recognized, clustered, logically concatenated, visualized and summarized in reports. CONCLUSION: As shown in this study, inGeno can be applied to study and compare prokaryotic genomes in particular (gram positive and negative as well as close and more distantly related species) and has been proven to be sensitive and accurate. This modular software is user-friendly and easily accommodates new routines to meet specific user-defined requirements.

    Precise detection of rearrangement breakpoints in mammalian chromosomes

    Background: Genomes undergo large structural changes that alter their organisation. The chromosomal regions affected by these rearrangements are called breakpoints, while those which have not been rearranged are called synteny blocks. We developed a method to precisely delimit rearrangement breakpoints on a genome by comparison with the genome of a related species. Contrary to current methods, which search for synteny blocks and simply return what remains in the genome as breakpoints, we propose to go further and to investigate the breakpoints themselves in order to refine them. Results: Given some reliable and non-overlapping synteny blocks, the core of the method consists of refining the regions that are not contained in them. By aligning each breakpoint sequence against its specific orthologous sequences in the other species, we can look for weak similarities inside the breakpoint, thus extending the synteny blocks and narrowing the breakpoints. The identification of the narrowed breakpoints relies on a segmentation algorithm and is statistically assessed. Since this method requires as input synteny blocks with some properties which, though they appear natural, are not verified by current methods for detecting such blocks, we further give a formal definition and provide an algorithm to compute them. The whole method is applied to delimit breakpoints on the human genome when compared to the mouse and dog genomes. Among the 355 human-mouse and 240 human-dog breakpoints, 168 and 146 respectively span less than 50 Kb. We compared the resulting breakpoints with some publicly available ones and show that we achieve a better resolution. Furthermore, we suggest that breakpoints are rarely reduced to a point, and instead often consist of large regions that can be distinguished from the surrounding sequences in terms of segmental duplications, similarity with related species, and transposable elements. Conclusion: Our method leads to smaller breakpoints than already published ones and allows for a better description of their internal structure. In the majority of cases, our refined breakpoint regions exhibit specific biological properties (no similarity, presence of segmental duplications and of transposable elements). We hope that this new result may provide some insight into the mechanism and evolutionary properties of chromosomal rearrangements.

    Context-driven discovery of gene cassettes in mobile integrons using a computational grammar

    Background: Gene discovery algorithms typically examine sequence data for low-level patterns. A novel method to computationally discover higher order DNA structures is presented, using a context-sensitive grammar. The algorithm was applied to the discovery of gene cassettes associated with integrons. The discovery and annotation of antibiotic resistance genes in such cassettes is essential for effective monitoring of antibiotic resistance patterns and formulation of public health antibiotic prescription policies. Results: We discovered two new putative gene cassettes using the method, from 276 integron features and 978 GenBank sequences. The system achieved κ = 0.972 annotation agreement with an expert gold standard of 300 sequences. In rediscovery experiments, we deleted 789,196 cassette instances over 2030 experiments and correctly relabelled 85.6% (α ≥ 95%, E ≤ 1%, mean sensitivity = 0.86, specificity = 1, F-score = 0.93), with no false positives. Error analysis demonstrated that for 72,338 missed deletions, two adjacent deleted cassettes were labeled as a single cassette, increasing performance to 94.8% (mean sensitivity = 0.92, specificity = 1, F-score = 0.96). Conclusion: Using grammars we were able to represent heuristic background knowledge about large and complex structures in DNA. Importantly, we were also able to use the context embedded in the model to discover new putative antibiotic resistance gene cassettes. The method is complementary to existing automatic annotation systems which operate at the sequence level.

    Dynamics of Genome Rearrangement in Bacterial Populations

    Genome structure variation has profound impacts on phenotype in organisms ranging from microbes to humans, yet little is known about how natural selection acts on genome arrangement. Pathogenic bacteria such as Yersinia pestis, which causes bubonic and pneumonic plague, often exhibit a high degree of genomic rearrangement. The recent availability of several Yersinia genomes offers an unprecedented opportunity to study the evolution of genome structure and arrangement. We introduce a set of statistical methods to study patterns of rearrangement in circular chromosomes and apply them to the Yersinia. We constructed a multiple alignment of eight Yersinia genomes using Mauve software to identify 78 conserved segments that are internally free from genome rearrangement. Based on the alignment, we applied Bayesian statistical methods to infer the phylogenetic inversion history of Yersinia. The sampling of genome arrangement reconstructions contains seven parsimonious tree topologies, each having different histories of 79 inversions. Topologies with a greater number of inversions also exist, but were sampled less frequently. The inversion phylogenies agree with results suggested by SNP patterns. We then analyzed reconstructed inversion histories to identify patterns of rearrangement. We confirm an over-representation of “symmetric inversions”, that is, inversions with endpoints that are equally distant from the origin of chromosomal replication. Ancestral genome arrangements demonstrate moderate preference for replichore balance in Yersinia. We found that all inversions are shorter than expected under a neutral model, whereas inversions acting within a single replichore are much shorter than expected. We also found evidence for a canonical configuration of the origin and terminus of replication. Finally, breakpoint reuse analysis reveals that inversions with endpoints proximal to the origin of DNA replication are nearly three times more frequent. Our findings represent the first characterization of genome arrangement evolution in a bacterial population evolving outside laboratory conditions. Insight into the process of genomic rearrangement may further the understanding of pathogen population dynamics and selection on the architecture of circular bacterial chromosomes.
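
The notion of a "symmetric inversion" in the abstract above (endpoints equally distant from the origin of replication) can be made concrete on a circular coordinate system. What follows is a hedged sketch under simple assumptions, not the statistical method of the paper: positions are integer base-pair coordinates on a circle, and the helper names `origin_distance` and `inversion_asymmetry` are hypothetical.

```python
def origin_distance(pos, origin, genome_length):
    """Shortest distance in bp from pos to the origin on a circular chromosome."""
    d = abs(pos - origin) % genome_length
    return min(d, genome_length - d)


def inversion_asymmetry(left, right, origin, genome_length):
    """Difference in bp between the origin distances of the two inversion endpoints.

    A value of 0 means the endpoints are exactly equidistant from the origin;
    larger values mean a less symmetric inversion.
    """
    return abs(
        origin_distance(left, origin, genome_length)
        - origin_distance(right, origin, genome_length)
    )


# Hypothetical 1000 bp circular chromosome with the origin at position 0.
print(inversion_asymmetry(100, 900, origin=0, genome_length=1000))
print(inversion_asymmetry(100, 700, origin=0, genome_length=1000))
```

In the first call both endpoints lie 100 bp from the origin, on either side of it, so the asymmetry is zero; in the second they lie 100 bp and 300 bp away. Any cutoff for calling an inversion "symmetric" in real data would be an additional assumption that the abstract does not specify.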

    Comparative Geno-Plasticity Analysis of Mycoplasma bovis HB0801 (Chinese Isolate)

    Mycoplasma bovis pneumonia in cattle has been epidemic in China since 2008. To investigate M. bovis pathogenesis, we completed genome sequencing of strain HB0801 isolated from a lesioned bovine lung from Hubei, China. The genomic plasticity was determined by comparing HB0801 with M. bovis strain ATCC® 25523™/PG45 from cow mastitis milk, Chinese strain Hubei-1 from lesioned lung tissue, and 16 other Mycoplasma species. Compared to PG45, the genome size of HB0801 was reduced by 11.7 kb. Furthermore, a large chromosome inversion (580 kb) was confirmed in all Chinese isolates including HB0801, HB1007, a strain from cow mastitis milk, and Hubei-1. In addition, the variable surface lipoproteins (vsp) gene cluster existed in HB0801, but contained fewer than half of the genes and had poor identity to that in PG45, although the genes had conserved structures. Further inter-strain comparisons revealed other mechanisms of gene acquisition and loss in HB0801 that primarily involved insertion sequence (IS) elements, an integrative conjugative element, restriction and modification systems, and some lipoproteins and transmembrane proteins. Subsequently, PG45 and HB0801 virulence in cattle was compared. Results indicated that both strains were pathogenic to cattle. The gross pathological assessment scores for the control group and the PG45- and HB0801-infected groups were 3, 13 and 9, respectively. Meanwhile, the lung lesion scores for these three groups were 36, 70, and 69, respectively. In addition, immunohistochemistry detection demonstrated that both strains were similarly distributed in lungs and lymph nodes. Although PG45 showed slightly higher virulence in calves than HB0801, there was no statistical difference between the strains (P>0.05). Compared to Hubei-1, a total of 122 SNP loci were disclosed in HB0801. In conclusion, although genomic plasticity was thought to be an evolutionary advantage, it did not apparently affect virulence of M. bovis strains in cattle.

    Comparative genomics of European Avian Pathogenic E. coli (APEC)

    Background: Avian pathogenic Escherichia coli (APEC) causes colibacillosis, which results in significant economic losses to the poultry industry worldwide. However, the diversity between isolates remains poorly understood. Here, a total of 272 APEC isolates collected from the United Kingdom (UK), Italy and Germany were characterised using multiplex polymerase chain reactions (PCRs) targeting 22 equally weighted factors covering virulence genes, R-type and phylogroup. Following these analyses, 95 of the selected strains were further analysed using Whole Genome Sequencing (WGS). Results: The most prevalent phylogroups were B2 (47%) and A1 (22%), although there were national differences with Germany presenting group B2 (35.3%), Italy presenting group A1 (53.3%) and the UK presenting group B2 (56.1%) as the most prevalent. R-type R1 was the most frequent type (55%) among APEC, but multiple R-types were also frequent (26.8%). Following compilation of all the PCR data, which covered a total of 15 virulence genes, it was possible to build a similarity tree using each PCR result unweighted to produce 9 distinct groups. The average number of virulence genes was 6-8 per isolate, but no positive association was found between phylogroup and number or type of virulence genes. A total of 95 isolates representing each of these 9 groupings were genome sequenced and analysed for in silico serotype, Multilocus Sequence Typing (MLST), and antimicrobial resistance (AMR). The UK isolates showed the greatest variability in terms of serotype and MLST compared with German and Italian isolates, whereas the lowest prevalence of AMR was found for German isolates. Similarity trees were compiled using sequencing data, and notably single nucleotide polymorphism data generated ten distinct geno-groups. The frequency of geno-groups across Europe comprised 26.3% belonging to Group 8 (serogroups O2, O4, O18 and MLST types ST95, ST140, ST141, ST428, ST1618 and others), 18.9% belonging to Group 1 (serogroups O78 and MLST types ST23, ST2230), 15.8% belonging to Group 10 (serogroups O8, O45, O91, O125ab and variable MLST types), 14.7% belonging to Group 7 (serogroups O4, O24, O35, O53, O161 and MLST type ST117) and 13.7% belonging to Group 9 (serogroups O1, O16, O181 and others and MLST types ST10, ST48 and others). The other groups (2, 3, 4, 5 and 6) each contained relatively few strains. For some of the genogroups (e.g. groups 6 and 7), partial overlap between SNP grouping and PCR grouping was observable, matching PCR groups 8 (13 of 22 isolates) and 1 (14 of 16 isolates). However, it was not possible to obtain a clear correlation between genogroups and unweighted PCR groupings. This may be due to the genome plasticity of E. coli that enables strains to carry the same virulence factors even if the overall genotype is substantially different. Conclusions: The conclusion to be drawn from the lack of correlations is that firstly, APEC are very diverse and secondly, it is not possible to rely on any one or more basic molecular or phenotypic tests to define APEC with clarity, reaffirming the need for whole genome analysis approaches which we describe here. This study highlights the presence of previously unreported serotypes and MLSTs for APEC in Europe. Moreover, it is a first step towards a cautious reconsideration of the merits of classical identification criteria such as R typing, phylogrouping and serotyping.

    Phage Encoded H-NS: A Potential Achilles Heel in the Bacterial Defence System

    The relationship between phage and their microbial hosts is difficult to elucidate in complex natural ecosystems. Engineered systems performing enhanced biological phosphorus removal (EBPR) offer stable, lower-complexity communities for studying phage-host interactions. Here, metagenomic data from an EBPR reactor dominated by Candidatus Accumulibacter phosphatis (CAP) led to the recovery of three complete and six partial phage genomes. Heat-stable nucleoid structuring (H-NS) protein, a global transcriptional repressor in bacteria, was identified in one of the complete phage genomes (EPV1), and was most similar to a homolog in CAP. We infer that EPV1 is a CAP-specific phage and has the potential to repress up to 6% of host genes based on the presence of putative H-NS binding sites in the CAP genome. These genes include CRISPR-associated proteins and a Type III restriction-modification system, which are key host defense mechanisms against phage infection. Further, EPV1 was the only member of the phage community found in an EBPR microbial metagenome collected seven months earlier. We propose that EPV1 laterally acquired H-NS from CAP, providing it with a means to reduce bacterial defenses, a selective advantage over other phage in the EBPR system. Phage-encoded H-NS could constitute a previously unrecognized weapon in the phage-host arms race.

    Metabolic Versatility and Antibacterial Metabolite Biosynthesis Are Distinguishing Genomic Features of the Fire Blight Antagonist Pantoea vagans C9-1

    Smits THM, Rezzonico F, Kamber T, et al. Metabolic Versatility and Antibacterial Metabolite Biosynthesis Are Distinguishing Genomic Features of the Fire Blight Antagonist Pantoea vagans C9-1. PLoS ONE. 2011;6(7): e22247. Background: Pantoea vagans is a commercialized biological control agent used against the pome fruit bacterial disease fire blight, caused by Erwinia amylovora. Compared to other biocontrol agents, relatively little is currently known regarding Pantoea genetics. Better understanding of antagonist mechanisms of action and ecological fitness is critical to improving efficacy. Principal Findings: Genome analysis indicated two major factors contribute to biocontrol activity: competition for limiting substrates and antibacterial metabolite production. Pathways for utilization of a broad diversity of sugars and acquisition of iron were identified. Metabolism of sorbitol by P. vagans C9-1 may be a major metabolic feature in biocontrol of fire blight. Biosynthetic genes for the antibacterial peptide pantocin A were found on a chromosomal 28-kb genomic island, and for dapdiamide E on the plasmid pPag2. There was no evidence of potential virulence factors that could enable an animal or phytopathogenic lifestyle and no indication of any genetic-based biosafety risk in the antagonist. Conclusions: Identifying key determinants contributing to disease suppression allows the development of procedures to follow their expression in planta, and the genome sequence contributes to rational risk assessment regarding the use of the biocontrol strain in agricultural systems.

    Sequence of the hyperplastic genome of the naturally competent Thermus scotoductus SA-01

    Background: Many strains of Thermus have been isolated from hot environments around the world. Thermus scotoductus SA-01 was isolated from fissure water collected 3.2 km below surface in a South African gold mine. The isolate is capable of dissimilatory iron reduction and of growth with oxygen and nitrate as terminal electron acceptors, and its ability to reduce a variety of metal ions, including gold, chromate and uranium, has been demonstrated. The genomes from two different Thermus thermophilus strains have been completed. This paper represents the completed genome from a second Thermus species, T. scotoductus. Results: The genome of Thermus scotoductus SA-01 consists of a chromosome of 2,346,803 bp and a small plasmid which together are about 11% larger than the Thermus thermophilus genomes. The T. thermophilus megaplasmid genes are part of the T. scotoductus chromosome, and extensive rearrangement, deletion of nonessential genes and acquisition of gene islands have occurred, leading to a loss of synteny between the chromosomes of T. scotoductus and T. thermophilus. At least nine large inserts, of which seven were identified as alien, were found, the most remarkable being a denitrification cluster and two operons relating to the metabolism of phenolics which appear to have been acquired from Meiothermus ruber. The majority of acquired genes are from closely related species of the Deinococcus-Thermus group, and many of the remaining genes are from microorganisms with a thermophilic or hyperthermophilic lifestyle. The natural competence of Thermus scotoductus was confirmed experimentally, as expected given that most of the proteins of the natural transformation system of Thermus thermophilus are present. Analysis of the metabolic capabilities revealed an extensive energy metabolism with many aerobic and anaerobic respiratory options. An abundance of sensor histidine kinases, response regulators and transporters for a wide variety of compounds is indicative of an oligotrophic lifestyle. Conclusions: The genome of Thermus scotoductus SA-01 shows remarkable plasticity with the loss, acquisition and rearrangement of large portions of its genome compared to Thermus thermophilus. Its ability to naturally take up foreign DNA has helped it adapt rapidly to a subsurface lifestyle in the presence of a dense and diverse population which acted as a source of nutrients. The genome of Thermus scotoductus illustrates how rapid adaptation can be achieved by a highly dynamic and plastic genome.